Molecular Biology and Evolution, 2004, Issue 1
Site-Specific Recombination Links the Evolution of P2-like Coliphages and Pathogenic Enterobacteria
     Department of Genetics, University of Stockholm, Stockholm, Sweden

    E-mail: anders.nilsson@genetics.su.se.

    Abstract

    The genome of the tailed temperate coliphage P2 (Myoviridae) contains several genes that are probably horizontally transferred additions to the genome. One of these genes, Z/fun, was recently found intact in the genome of Neisseria meningitidis. We have investigated the presence of P2-like phages, and the genetic variation at the position corresponding to the phage P2 Z/fun locus, in the Escherichia coli reference collection (ECOR). P2-like phages are common in E. coli: they are present in about 30% of the ECOR strains. Hybridizations and PCR amplifications indicate that the overall variation among these phages is small. Amplification of the region corresponding to the phage P2 Z/fun locus in 11 prophages revealed that this is a multivariable locus. Sequencing of the region yielded 10 completely different sequences, all with a high AT content similar to that of the Z/fun gene. All sequences contained at least one open reading frame with good transcription and translation signals, and all were flanked by a highly similar, previously undiscovered inverted repeat (IR). We also found this IR in genetically unstable regions in pathogenic enterobacteria. This demonstrates that P2-like phages are important factors in bacterial evolution, not only because they carry a diversity of lysogenic conversion genes but also because they can act as vectors for single genes. The genes found between the IRs have unknown functions, and only a few clearly similar genes have been found in other bacteria.

    Key Words: bacteriophage P2 · enterobacteria · site-specific recombination · lysogenic conversion · virulence factors

    Introduction

    The genome of the temperate Escherichia coli phage P2 contains three regions that are thought to be relatively recent, horizontally transferred additions to the genome. These regions contain sequences with a higher AT content than the rest of the genome, and genes that use codons that are rare in the host. The four genes in these regions, orf30, Z/fun, old, and tin, are nonessential, constitutively transcribed, and absent from other P2-like phages (i.e., 186, CTX, HP1, HP2, Fels2, K139, or SopE) that are otherwise highly similar to P2. These P2-like phages, or prophages, have either acquired different genes in some of these regions or lack genes at the equivalent locations. In phage P2, three of these genes have been shown to encode proteins that cause advantageous lysogenic conversions of the host: Old obstructs the multiplication of lambdoid phages, and Tin makes the host refractory to T-even phages. The third protein, Z/Fun, so called because it was once believed to be two different proteins, is pleiotropic. Earlier investigations have shown that it inhibits plaque formation by phage T5 and makes the host sensitive to 5-fluorodeoxyuridine and 5-fluorouracil (Calendar et al. 1998). Different mutations in the gene can change plaque appearance or increase sensitivity to an unknown cell-density-dependent cytotoxin of the host (Bertani 1976, 1978). Prophages Fels2 and SopE have different genes at the same location as Z/fun in P2, but the other phage or prophage genomes sequenced so far have nothing at all at this location. Phage 186, for example, has no gene in this position: there are approximately 100 nucleotides between the end of orf 45 and the start of gene J (genes G and FI in P2), and this sequence contains only the V–orf 45 transcript terminator and the J promoters.

    Until recently, Z/Fun showed no similarity to any other protein in the databases, but genome sequencing projects have revealed homologs in Neisseria meningitidis (likewise called FunZ, 531 amino acids), with over 58% identity (Tettelin et al. 2000), and on the Bacillus anthracis plasmid pXO2 (pXO2-71, 517 amino acids), with about 20% identity (Okinaka et al. 1999) (table 1). These organisms are not closely related to E. coli, the standard P2 host, so the distribution of this gene is a fine illustration of the view that bacteria, plasmids, and phages are all part of the same coevolving system (Hendrix et al. 2000). The high similarity of Z/Fun to FunZ of N. meningitidis in particular, together with the fact that funZ is not located in a prophage, implies that this protein has additional and more general functions.

    Table 1 Data of Inserted DNA Sequences and Similarity to Other Proteins.

    Acquisition of new genes, for example genes conferring antibiotic resistance or increased pathogenicity, is central to the formation of new bacterial strains (Lawrence and Ochman 1998; Campbell 2000), and conjugative plasmids and transducing phages are probably fundamental to such bacterial microevolution (Wren 2000). However, gene transfer can involve many other vectors and mechanisms at every stage of the process. Some nontransducing phages are also important suppliers of new genes and may even possess recombination systems of their own (Wagner and Waldor 2002), but no major role for phage P2 has yet been demonstrated.

    In this paper, we report the distribution and characterization of P2-like prophages among the 72 strains of the E. coli reference collection, ECOR (Ochman and Selander 1984), and describe the large genetic variation at the position analogous to the phage P2 Z/fun locus in these phages. We also report that this variation, and corresponding variation in pathogenic bacteria and plasmids, is mediated by the same mechanism. This implies that P2-like phages can serve both as a supply of genes affecting bacterial virulence and as vectors that move such genes between strains.

    Materials and Methods

    Strategy for Detection of P2-like Phages in the ECOR Collection

    Several independent experiments (i.e., hybridizations and PCR amplifications) were used to detect P2-like prophages, to ensure the reliability of the observations. The presence of P2-like prophages in the ECOR strains was first detected by DNA dot-blot hybridization of all strains against a probe of full-length P2 genomic DNA. Strains with a positive signal were hybridized with three additional probes to check for regional variability. Apart from the full-length genomic probe, the hybridization probes consisted of three separate PCR products covering the regions V, W, J, and I; H, G, and Z/fun; and ogr, int, C, cox, and orf 78. Second, specific genes not found in E. coli chromosomes were amplified by PCR. Because the late region of P2-like phages is highly conserved in prophages from different E. coli isolates (Nilsson and Haggård-Ljungquist 2001), all 72 ECOR strains were analyzed for the presence of the P2 capsid scaffold gene, O, and strains with a positive hybridization signal against P2 DNA were analyzed for the capsid part of the late genes (O-X), the late gene activator ogr, and the Z/fun gene.

    Bacterial Strains and DNA Extraction

    All bacterial strains were grown overnight in Luria-Bertani (LB) medium at 37°C. Chromosomal DNA from the ECOR strains, from the positive control (E. coli P2 lysogen strain C-117; Bertani 1968), and from the nonlysogenic negative control (E. coli strain C-1757; Sunshine et al. 1971) was extracted with the Qiagen Blood and Cell Culture DNA extraction kit. ECOR strains that had been shown to be contaminated or mixed up in some collections (Johnson et al. 2001) were obtained from an additional source.

    DNA-DNA Dot-Blot Hybridization

    Two milligrams of spectrophotometrically quantified DNA from each ECOR strain and from the positive and negative control strains were applied to each of four Zeta-Probe GT membranes with a Bio-Dot microfiltration apparatus (Bio-Rad). The hybridization probes specified above were amplified directly from P2 DNA using Ready-to-Go PCR beads (Amersham Biosciences) and 18- to 26-nt oligonucleotide primers (DNA Technology, Aarhus, Denmark) designed from the P2 nucleotide sequence (table 2). The PCR products were purified with the Qiaquick PCR purification kit (Qiagen) and checked by electrophoresis on a 1% agarose gel, followed by ethidium bromide staining and inspection under UV light. The probes were labeled with [α-32P]dCTP using the Rediprime II random prime labeling system (Amersham Biosciences). Hybridizations were carried out according to the manufacturer's protocol and followed by autoradiography.

    Table 2 Oligonucleotide Primers.

    PCR Amplification of Phage P2 Genes

    The presence of the P2 O gene, the late genes O-X, ogr, and Z/fun among the ECOR strains was determined by PCR amplification and analyzed as stated above, with DNA from the P2 lysogen C-117 as a positive control. All amplification primers are listed in table 2. After a 5-min denaturation step at 95°C, short sequences were amplified with the standard program: 30 cycles of 1 min at 95°C, 1 min at 55°C, and 2 min at 72°C. The O-X sequence, and particularly some of the sequences inserted between genes G and FI, required extension times of up to 4.30 min. If a result was negative, other primer combinations were tried.
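The cycling conditions above can be written down as data, which makes the total run time explicit. This is only an illustrative sketch: the representation and names are ours, ramp times between temperatures are ignored, and the "4.30 min" extension is read here as 4 min 30 s, which is an assumption.

```python
# PCR program from the text as (step, minutes). Ramp times are ignored, and
# "4.30 min" is interpreted as 4 min 30 s (an assumption, not stated in the source).
INITIAL_DENATURATION_MIN = 5.0
STANDARD_CYCLE = [("denature 95 C", 1.0), ("anneal 55 C", 1.0), ("extend 72 C", 2.0)]
N_CYCLES = 30


def program_minutes(cycle, n_cycles=N_CYCLES, initial=INITIAL_DENATURATION_MIN):
    """Total hold time of the program in minutes."""
    return initial + n_cycles * sum(minutes for _, minutes in cycle)


def with_extension(cycle, extension_min):
    """Copy of a cycle with the 72 C extension step set to extension_min."""
    return [(step, extension_min if step.startswith("extend") else minutes)
            for step, minutes in cycle]


standard_total = program_minutes(STANDARD_CYCLE)                   # 5 + 30 * 4
long_total = program_minutes(with_extension(STANDARD_CYCLE, 4.5))  # 5 + 30 * 6.5
print(standard_total, long_total)  # prints: 125.0 200.0
```

Under these assumptions, lengthening only the extension step raises a run from about two hours to well over three.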

    DNA Sequencing of the Z-Region, Corresponding to the P2 Z/fun Locus

    Sequencing of all the different inserts between G and FI was initially done on PCR products, purified as described above and cloned into the BamHI site of pUC18. Almost all of the inserts were too long to be completed by sequencing from both ends with the plasmid forward and reverse primers, so most of the remaining sequencing was done by primer walking and directly on PCR product templates. Automated DNA sequencing was carried out on an ABI Prism 377 (PerkinElmer) or an ALFexpress II (Amersham Biosciences). Additional sequencing, including primer walking of the ECOR30, 31, 46, 48, and 58 inserts, and contig assembly of these inserts were done by MWG-Biotech (Ebersberg, Germany). Other contigs were assembled with ALFwin Sequence Analyzer, with the GCG Fragment Assembly System (Genetics Computer Group 1999), or manually. The sequence data have been submitted to the EMBL database under accession numbers AJ512675 to AJ512685.

    Sequence Analysis

    Most of the sequence analysis was done with programs in the GCG package. Putative open reading frames (ORFs) in the inserts were identified with FRAMES, which translates DNA sequences in all six frames using the bacterial translation table. Only proteins longer than 50 amino acids were considered in the ORF search. Promoters and ribosomal binding sites of ORFs were searched for with FINDPATTERNS, and TESTCODE was used to check for the presence of protein-coding sequences. PEPTIDESORT was used to estimate protein molecular weights, and COMPOSITION was used to estimate the frequencies of single nucleotides, dinucleotides, and trinucleotides. The expression and size of the presumed proteins were confirmed in coupled in vitro transcription/translation assays. To avoid read-through from unwanted promoters, these assays used linear PCR fragments as templates rather than sequences cloned into plasmids. The assays were based on an E. coli S30 extract kit and were performed and analyzed according to the manufacturer's protocol (Promega). MEME was used to search for conserved patterns in the proteins, and database searches for sequence motifs in the putative proteins were also done with MOTIFS, which compares protein sequences with the patterns defined in the PROSITE Dictionary of Protein Sites and Patterns.
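FRAMES itself is part of the GCG package and is not reproduced here. As a rough stand-in, a six-frame search for ORFs longer than 50 amino acids can be sketched as below. Assumptions: the function names are hypothetical, only ATG is accepted as a start codon by default (the bacterial table also allows alternative starts such as GTG and TTG, which can be passed in), and an ORF is taken to run from the first in-frame start to the next stop.

```python
# Minimal six-frame ORF scan; a hypothetical stand-in, not GCG FRAMES.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_length_aa=51, start_codons=("ATG",)):
    """Return ORFs as (strand, frame, start, end, length_aa).

    An ORF runs from the first in-frame start codon to the next stop codon.
    min_length_aa=51 keeps only proteins longer than 50 amino acids.
    Coordinates are 0-based and half-open on the scanned strand; for the
    '-' strand they refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon in start_codons:
                        start = i
                elif codon in STOP_CODONS:
                    length_aa = (i - start) // 3  # stop codon not counted
                    if length_aa >= min_length_aa:
                        orfs.append((strand, frame, start, i + 3, length_aa))
                    start = None
    return orfs


# ATG + 60 GCT codons + TAA encodes a 61-residue peptide on the + strand.
print(find_orfs("ATG" + "GCT" * 60 + "TAA"))  # [('+', 0, 0, 186, 61)]
```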

    Secondary structures in the upstream and downstream inverted repeat regions were assessed with MFOLD, REPEAT, and STEMLOOP. The insertion sequence (IS) database at http://www-is.biotoul.fr/IS.html was used to search for IS-specific patterns in the inverted repeat regions and within the inserts. Database similarity searches of coding regions were mainly done at the European Bioinformatics Institute Web site http://www.ebi.ac.uk/fasta33/index.html with FASTX3 (Pearson et al. 1997; Pearson 2000) against the SWALL (Swissprot and translated EMBL) databases. The program translates the nucleotide sequence in all six frames before performing the search and allows frameshifts between codons, such as those caused by sequencing errors. For a database of a given size, the expect score (E score) can be interpreted as the number of unrelated sequences, of the same length as the query and the hit sequence, that would be expected to match at least as well purely by chance. A significant match indicating a true relationship has an E score below 0.05, which corresponds to expecting five unrelated matches in 100 searches. We also report similarities in the grey zone, 0.05 to 1.0, which may include weakly similar proteins as well as false positives. The Blast and tBlastX programs at http://www.ncbi.nlm.nih.gov/BLAST/ were used for complementary database searches.
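The E-score thresholds used here can be expressed as a small helper. Under the usual assumption that chance hits follow a Poisson distribution with mean E, the probability of at least one chance hit is P = 1 − exp(−E), so E = 0.05 corresponds to P ≈ 0.049 per search. The function names are illustrative; only the thresholds come from the text.

```python
import math


def chance_hit_probability(e_value):
    """P(at least one chance hit), assuming chance hits are Poisson with mean E."""
    return 1.0 - math.exp(-e_value)


def classify_hit(e_value, significant_below=0.05, grey_zone_upper=1.0):
    """Bin a database hit by E score using the thresholds given in the text."""
    if e_value < significant_below:
        return "significant"
    if e_value <= grey_zone_upper:
        return "grey zone (possible false positive)"
    return "not reported"


# E = 0.05 means about 0.05 chance hits per search, i.e. about 5 in 100 searches.
expected_chance_hits_in_100 = 100 * 0.05
print(classify_hit(0.001), round(chance_hit_probability(0.05), 4))  # significant 0.0488
```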

    Phylogenetic Analysis

    Prophage, bacterial, and plasmid sequences on both sides of the different inserts (i.e., both inverted repeat regions) were concatenated into 150-nt sequences and aligned with ClustalX (Thompson, Higgins, and Gibson 1994). Because variation was small in most parts, only a few gaps had to be introduced in the sequences downstream of the boxed part of IRR (fig. 1). Phylogenetic relationships were analyzed with PAUP* (Swofford 1999) under the maximum-parsimony criterion, using the heuristic search option. The stepwise addition of taxa was randomized 10 times per run, and the two shortest trees from each run were kept. All other program parameters were set to default values (e.g., no character weighting or exclusion). Confidence in the shortest tree for each data set was assessed by bootstrap analysis with 100 replicates. The concatenated sequence matrix was analyzed both in four divisions (the unboxed part of IRL, the boxed part of IRL, the boxed part of IRR, and the remainder) and as a whole (all 150 nucleotide characters).
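PAUP* performs the actual heuristic searches; the sketch below only illustrates the two ingredients named above, using hypothetical helper names: the Fitch parsimony length of a fixed rooted binary tree, and a nonparametric bootstrap replicate made by resampling alignment columns with replacement. Gaps are treated as a fifth character state for simplicity, which may differ from the settings used in PAUP*. A real bootstrap analysis reruns the tree search on every replicate; scoring two fixed topologies, as in the toy usage, is a simplification.

```python
import random


def fitch_length(tree, column):
    """Fitch minimum number of changes for one site on a rooted binary tree.

    tree: nested 2-tuples of taxon names, e.g. (("A", "B"), ("C", "D"))
    column: dict taxon -> character at this site ('-' treated as a fifth state)
    """
    def visit(node):
        if isinstance(node, str):
            return {column[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        shared = left_set & right_set
        if shared:
            return shared, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def tree_length(tree, alignment, sites=None):
    """Parsimony length summed over sites (all by default); repeated indices count repeatedly."""
    n_sites = len(next(iter(alignment.values())))
    sites = range(n_sites) if sites is None else sites
    return sum(fitch_length(tree, {taxon: seq[i] for taxon, seq in alignment.items()})
               for i in sites)


def bootstrap_sites(n_sites, rng):
    """One nonparametric bootstrap replicate: n_sites columns drawn with replacement."""
    return [rng.randrange(n_sites) for _ in range(n_sites)]


# Toy usage: how often is tree_ab strictly shorter over 100 resampled matrices?
alignment = {"A": "AAG", "B": "AAG", "C": "CCG", "D": "CCT"}
tree_ab = (("A", "B"), ("C", "D"))
tree_ac = (("A", "C"), ("B", "D"))
rng = random.Random(1)
wins = 0
for _ in range(100):
    sites = bootstrap_sites(3, rng)
    if tree_length(tree_ab, alignment, sites) < tree_length(tree_ac, alignment, sites):
        wins += 1
print(tree_length(tree_ab, alignment), tree_length(tree_ac, alignment))  # 3 5
```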

    FIG. 1. Alignment of the 50% consensus P2-like prophage sequence and corresponding bacterial sequences. Numbers between the left and right inverted repeats (IRL and IRR) are the lengths of the inserted sequences. Percentages below the sequences are ungapped identities of the bacterial sequences to the 50% consensus at the top. The boxed parts represent presumed site-specific recombination attL and attR sites.

    Results

    P2-like Prophages in the ECOR Collection

    Hybridization with labeled full-length P2 DNA showed that 26 of the strains in the ECOR collection contained P2-like sequences, although six of these gave a considerably weaker signal (fig. 2). Subsequent hybridizations against three different parts of the P2 genome gave detectable signals for only two of these six strains, ECOR17 and ECOR42. ECOR17 hybridized weakly with two of the P2 subset probes, the G, H, and Z/fun probe and the ogr, int, C, cox, and orf 78 probe. ECOR42 hybridized strongly with the G, H, and Z/fun probe but not at all with the ogr, int, C, cox, and orf 78 probe. For the remaining 20 strains, the results were consistent across all hybridizations.

    FIG. 2. DNA-DNA dot-blot hybridization of full-length bacteriophage P2 DNA against the 72 strains of the Escherichia coli reference collection (ECOR). Each spot represents the hybridization result for the strain given by the sum of a row and a column number. The E. coli P2 lysogen C-117 was used as a positive control (+), and E. coli C-1757 was used as a negative control (–). Sequencing of the late genes of the prophage in ECOR43 revealed that it was identical to the one in ECOR44. A new ECOR43 stock obtained from another source did not contain any P2-like prophage, so the result for this strain should be negative.

    PCR amplification of selected phage genes showed that gene O followed the same pattern: it was present only in the same 20 strains. In addition, all 20 strains contained ogr, the late gene transcriptional activator essential for production of phage particles. The O-X part of the late transcript was present in all prophages except those in ECOR10, 48, 61, 62, and 64. Since these all had the O gene, this was probably due to sequence variation and poor primer annealing in the more variable X gene. Together with the hybridization results, this confirmed that the ECOR strains contained not only 20 P2-like prophages (approximately 28%) but also a few defective prophages or relic phage genes.
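The reasoning used to call a strain a P2-like lysogen can be summarized as a simple decision rule, together with the arithmetic behind "approximately 28%". This is a hypothetical simplification (the function name and Boolean inputs are illustrative, and signal strength is ignored): a strain counts as carrying a complete P2-like prophage only if it gives a signal with the full-length probe and all three region probes and also carries O and ogr; any partial signal is taken to indicate a defective prophage or relic phage genes.

```python
def classify_strain(full_length_probe, region_probes, has_O, has_ogr):
    """Hypothetical summary of the detection strategy.

    full_length_probe: any signal with full-length P2 DNA
    region_probes: signals with the three region probes
        (V, W, J, I; H, G, Z/fun; ogr, int, C, cox, orf 78)
    has_O, has_ogr: PCR products for the capsid scaffold gene O and for ogr
    """
    if full_length_probe and all(region_probes) and has_O and has_ogr:
        return "P2-like prophage"
    if full_length_probe or any(region_probes):
        return "defective prophage or relic genes"
    return "negative"


# 20 of the 72 ECOR strains met every criterion.
prevalence_percent = round(100 * 20 / 72, 1)
print(prevalence_percent)  # 27.8, i.e. approximately 28%
```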

    Sequence Variation in the Region Analogous to the P2 Z/fun Locus, the Z-Region

    PCR amplification of the region between P2 genes G and FI was possible for only 11 of the 20 strains that contained P2-like prophages. Changing primers was of little use: different combinations of six forward and four reverse primers (table 2) were tried without success for the remaining nine prophages. Amplification of the ECOR42 prophage yielded a fragment, but it was hard to obtain enough PCR product for cloning and sequencing. Only two of the 11 amplified fragments were of the same length, and no fragment was the same length as the original P2 Z/fun locus (table 1). Subsequent DNA sequencing revealed that the large middle part of every sequence was different, except in ECOR5 and 64, where the sequence variation was around 1%. The large differences between sequences made it clear that this was not a locus in the conventional sense but a multivariable site, hereafter called the Z-region.

    In all cases, the heterogeneous middle part of the sequences was flanked on both sides by highly similar sequences, with rather sharp boundaries to the dissimilar part (fig. 3). Brought together, the flanking sequences can form a complex single-stranded DNA structure dominated by a long, imperfect inverted repeat (fig. 4). There were also two more inverted repeats within this long inverted repeat, and many short direct repeats spread throughout the structure. At the 5' end there was a stemloop containing the stop codon of gene G, and at the 3' end several inverted repeats formed three stemloops that include the promoter regions and the ribosomal binding site of the FI gene (fig. 4).
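The stemloops described here were found with MFOLD, REPEAT, and STEMLOOP, which tolerate imperfect pairing and, in MFOLD's case, use thermodynamic models. As a much simpler illustration, perfect inverted repeats (arm, loop, reverse-complement arm) can be located by direct string comparison. The function name and the default stem and loop sizes are assumptions, and real IRs such as these, which are imperfect, would be missed by an exact-match scan.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, stem=8, min_loop=3, max_loop=20):
    """Find perfect stem-loop candidates: arm, loop, reverse-complement arm.

    Returns (left_start, right_start, loop_length) tuples, where
    seq[left_start:left_start + stem] is the reverse complement of
    seq[right_start:right_start + stem].
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        target = reverse_complement(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == target:
                hits.append((i, j, loop))
    return hits


# An 8-bp arm, a TTTT loop, and the reverse-complement arm.
print(find_inverted_repeats("GAATTCCG" + "TTTT" + "CGGAATTC"))  # [(0, 12, 4)]
```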

    FIG. 3. Alignment of P2-like prophage DNA sequences from different strains in the Escherichia coli reference collection (ECOR). The first alignment begins 30 nt upstream of the end of gene G and ends with the first 20 bp of the different inserts. The second alignment begins with the inserts and ends 95 nt before the start of gene FI. The boxed part represents the most conserved 33 bp in prophages and bacteria, believed to be part of the att core site. Inverted repeats are marked with a solid line arrow and stemloops are marked with a dashed line arrow. Deletions are marked with a dotted line. A dash in consensus sequences means that there was no majority for a particular nucleotide at the different levels of consensus

    FIG. 4. Illustration of inverted repeats and stemloops in the conserved parts of the Z-region of P2-like prophages in the ECOR strains. The DNA sequence is the 50% consensus of 12 prophages and starts 30 nt before the end of gene G and ends 30 nt after the start of gene FI. Arrows indicate additional inverted repeats within the major inverted repeat

    Analysis of Inserted Sequences and Open Reading Frames

    Apart from the obviously homologous insertions in ECOR5 and 64, the inserts did not fall into discrete size classes (table 1). The inserted sequences varied between 843 and 3,661 bp and were all AT-rich (59% to 68% AT). Every insert contained at least one ORF, and longer inserts typically had two or more. In general, A is more frequent than T in the coding strand of genes, and these ORFs were no exception; in retrospect, the coding strand could be predicted from the A/T ratio. Furthermore, the trinucleotide AAA was more frequent in six of the 12 inserts than expected from the observed frequency of A (χ2 tests, P < 0.05). Because of the AT richness, it was not difficult to find good transcription and translation signals for the predicted ORFs (table 3), and the coupled in vitro transcription/translation assays confirmed the expression of a protein from six of the templates (fig. 5). Since the templates were generated by PCR with primers at the end of gene G and the start of gene FI, the promoter activities must be contained within the fragments. It is noteworthy that gene Z/fun, which lacks a detectable promoter sequence, is well expressed. Some fragments from other strains did not give an in vitro protein product, but this might reflect, for example, the lack of supercoiling in the linear template or secondary structures that cover transcription or translation signals.
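The composition statistics in this paragraph can be sketched as follows. The exact χ2 test used is not specified, so the version below is an assumption: it compares the observed count of overlapping AAA trinucleotides with the count expected if A residues occurred independently, (L − 2)·p_A^3, as a two-category goodness-of-fit test with 1 degree of freedom. Because overlapping trinucleotides are not independent, the P value is only approximate, and a significant result should be checked for direction (observed > expected). Function names are illustrative.

```python
from math import erfc, sqrt


def at_content(seq):
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


def a_to_t_ratio(seq):
    """A/T ratio of one strand; a value > 1 hints that this is the coding strand."""
    seq = seq.upper()
    return seq.count("A") / seq.count("T")


def aaa_chi_square(seq):
    """Observed vs expected overlapping AAA count, chi-square with 1 df.

    Expected count assumes independent bases: (L - 2) * p_A**3.
    Returns (observed, expected, chi2, p_value). The test is two-sided, so
    compare observed with expected to see whether AAA is in excess.
    """
    seq = seq.upper()
    n_windows = len(seq) - 2
    p_a = seq.count("A") / len(seq)
    observed = sum(1 for i in range(n_windows) if seq[i:i + 3] == "AAA")
    expected = n_windows * p_a ** 3
    # Two categories: AAA vs every other trinucleotide.
    chi2 = ((observed - expected) ** 2 / expected
            + (observed - expected) ** 2 / (n_windows - expected))
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return observed, expected, chi2, p_value


# An alternating AT sequence has no AAA at all, far below the 12.25 expected.
print(aaa_chi_square("AT" * 50)[:3])  # (0, 12.25, 14.0)
```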

    Table 3 Transcription and Translation Data of Open Reading Frames.

    FIG. 5. Autoradiograph of an SDS-PAGE gel analysis of proteins encoded by sequences inserted in the Z-region of prophages in the ECOR collection strains. The proteins were expressed with an in vitro transcription/translation assay system utilizing PCR-generated templates of the entire Z-regions. ECOR strain numbers are specified above lanes. The lane marked M shows the protein size ladder (Rainbow marker, Amersham Biosciences). The top marker band is 97 kDa, followed by 66, 45, 30, 20.1, and 14.3 kDa. The rightmost lane shows the phage P2 Z/Fun positive control protein

    Database searches for proteins similar to the 16 putative proteins showed that the ORFs were related to a variety of bacterial proteins, but whenever a high degree of similarity was found, it was to a protein of unknown function, often located on a plasmid (table 1). The similar proteins were distributed across many bacterial families, not only the enterobacteria. The nucleotide sequences were also used in complementary tBlastX searches against nucleotide databases, in which both query and database sequences are translated into amino acid sequences for greater sensitivity. In addition to the protein similarities shown in table 1, there were two strong indications of similarity to nonsense proteins (e.g., proteins inferred from intergenic regions or from alternative reading frames). Orf 9 in the ECOR46 prophage showed greater similarity to an intergenic part of the Salmonella enterica serovar Typhi CT18 plasmid pHCM2 (accession number AL513384 [Parkhill et al. 2001]) than to the complete protein from the Rhizobium etli plasmid (table 1), although the match spanned three different reading frames. The noncoding part of the ECOR53 insert was similar to a nonannotated short sequence from Staphylococcus epidermidis (accession number AF269322).

    All putative proteins in table 1 were subjected to searches for known amino acid patterns that could suggest a function, but there was no clear indication of any such motif or pattern in any sequence.

    The noncoding sequence of the insert in phage P2 included the left-end part of an insertion sequence, IS630, starting 53 nt from the Z/fun stop codon and ending 11 nt from the left inverted repeat (IRL). The partial IS was also found in many other enterobacteria; the most similar sequence was in the enterohaemorrhagic E. coli (EHEC) O157:H7 Sakai strain (Hayashi et al. 2001), with 64% identity over a 255-nt overlap. The P2 insert also contained the start of a transposase gene, whose first 12 amino acids are 75% identical to those of the Z4330 gene in O-island #122 of the other sequenced EHEC, O157:H7 EDL933 (Perna et al. 2001). However, in the P2 insert the gene is terminated after 20 amino acids by a frameshift caused by a single-base-pair deletion. There was also an 87-nt fragment of a gene similar to the E. coli TnpA transposase gene between ORFs in the ECOR46 prophage insert, but with a translational frameshift after the first 11 amino acids.

    Analysis of Inverted Repeats

    The inverted repeats on each side of the inserts (IRL and IRR) were quite conserved in all prophages (over 80% identity), but sequence variation increased slightly at the border with the inserts. Three prophages had a 12-bp deletion in the IRL stemloop, and the ECOR45 prophage lacked the 5' part of IRR (fig. 3). The two longest inserts, in ECOR46 and 48, had additional imperfect IRs within noncoding parts of the inserts: ECOR46 had an extra IRL between orf 8 and orf 9, and ECOR48 between orf 11 and orf 12, in both cases separating ORFs on different coding strands (table 1). Database searches with the prophage consensus IRs as queries did not indicate that any part of the IR sequences belonged to IS elements or carried integron signatures. The IRs of the Z-region were, however, found in bacteria, and extended searches revealed that they were confined to E. coli and Salmonella chromosomes and to enterobacterial plasmids (fig. 1). The inverted repeats in bacteria were in most cases more similar to the 33-nt ends (boxed in figure 1) of the prophage IRs than to their beginnings. Most of these bacterial IRs were located between genes of unknown function and obscure origin, but several were located either in pathogenicity-associated islands (PAIs) or in other regions containing virulence factor genes.
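The 50% consensus and ungapped identities used for these comparisons (figs. 1 and 3) can be computed along the following lines. In this sketch the helper names are illustrative, and two conventions are assumptions: a base is called only when it occurs in more than half of the aligned sequences (otherwise the column is marked with a dash), and identity is counted only over columns where neither sequence has a gap.

```python
from collections import Counter


def consensus(aligned, level=0.5):
    """Column-wise consensus of equal-length aligned sequences.

    A character is called only if it occurs in more than `level` of the
    sequences; otherwise '-' marks the lack of a majority. (A column whose
    majority character is a gap also yields '-'.)
    """
    n = len(aligned)
    calls = []
    for column in zip(*aligned):
        char, count = Counter(column).most_common(1)[0]
        calls.append(char if count / n > level else "-")
    return "".join(calls)


def ungapped_identity(seq, reference):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq, reference) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


print(consensus(["ACGT", "ACGA", "ACTT", "TCGT"]))  # ACGT
print(consensus(["AC", "AG"]))                      # A- (tie, no majority)
```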

    We found different inserts surrounded by the same IRs not only in the uropathogenic E. coli CFT073 PAI II (Rasko et al. 2001), the O-island #172 of E. coli O157:H7 EDL933, and the multiple-drug-resistant Salmonella Typhi CT18 but also in E. coli K-12 MG1655 (Blattner et al. 1997) and Salmonella enterica serovar Typhimurium LT2 (McClelland et al. 2001) (table 1). The sequences outside the IRs were different in most cases. One of the two Z-regions in Salmonella Typhi CT18 (in AL627272) and the Z-region in Salmonella Typhimurium LT2 were clearly located in the same sequence context, but CT18 had an extra gene immediately upstream of the region. The largest insert of all (>8 kb) was found within similar IRs in the Proteus vulgaris conjugative plasmid Rts1 (Murata et al. 2002) (table 1). Half of the plasmid genome consists of a large duplicated segment, M1a and M1b. The insert was present only in M1a, which means that it most likely became inserted after the duplication took place.

    For some genomes, the search found only one of the IRs. There was a 70% identical 66-nt IRL in a 450-nt sequenced fragment of the Klebsiella pneumoniae plasmid SL038 (accession number AJ276856) and a 65% identical IRL of the same size in the Shigella flexneri virulence plasmid WR100 (accession number AL391753 [Buchrieser et al. 2000]). The left IR was also found in the EHEC O157:H7 Sakai strain, in a region containing the same genes as in EDL933. The start of the gene following the IRL, coding for the first 31 amino acids, was missing in the Sakai strain compared with EDL933, suggesting that the beginning of the gene and a right IR may have been deleted at the same time.

    Particular attention was paid to the sequences surrounding the funZ gene in Neisseria meningitidis, to look for the same conserved inverted repeat motifs, but here the 33-bp part of the IRs closest to the insert (boxed in figure 1) was missing; instead, there was 68% identity to a 32-bp sequence immediately adjacent to the boxed part of both IRs. However, there was also a 155-bp sequence, 92% identical to the end of ISNme1 (a variant of IS1106 in the IS5 family), only 159 bp upstream of the end of funZ. Prophages Fels2 and SopE both contain genes at a position corresponding to the P2 Z/fun gene, but these genes appear to have been inserted by some other mechanism, since we found no IRs surrounding either of them.

    The phylogenetic analysis of all concatenated IR sequences showed that the prophages are more closely related to each other than to bacteria, even though the bootstrap analysis gave poor support for many nodes (fig. 6). Trees generated in separate analyses of parts of the alignment had even lower bootstrap values but did not indicate a different evolutionary history for any part. The number of homoplasious characters (i.e., nucleotide characters that support an alternative tree) was about the same in all parts of the alignment. The homoplasy was also comparable to that found in a phylogenetic analysis of the late genes of different P2-like phages, where it was concluded to be caused by homologous recombination (Nilsson and Haggård-Ljungquist 2001).

    FIG. 6. Unrooted phylogenetic tree illustrating the relationship between the concatenated upstream and downstream inverted repeats found in prophage, bacteria, and plasmid genomes. Numbers on branches are bootstrap consensus values, but only values above 50% are represented in the tree. Branches supported in less than 50% of the trees are collapsed to a common node. The tree was constructed with PAUP* version 4.0b10 (Swofford 1999)

    Discussion

    According to hybridizations and PCR amplifications, P2-like prophages are both frequent and remarkably similar in E. coli isolates. It is also apparent that the Z/fun gene is absent or rare among P2-like phages, which instead carry totally unrelated genes at the same locus. It is probable that this multivariable site, named the Z-region, is a regular lysogenic conversion locus containing different genes that partially define the host. The fact that we found no housekeeping genes in this region, only genes coding for "hypothetical" or "putative" proteins, supports this hypothesis. The genes are highly diverse and appear to come from an AT-rich source of unknown genes. The sequences inserted into prophages are bordered by inverted repeats, which we also repeatedly found in the chromosomes of enterobacteria; consequently, the definition of the Z-region must be extended to include these sites as well. Although similar genes or sequences are found in unrelated bacteria, we believe that the distribution of the inverted repeats limits the possible sources, and we suggest that the source is genetically unstable regions, including the pathogenicity-associated islands, in enterobacterial chromosomes. Even though some of these islands have a conserved part, often around 30 kb in size (Tauschek, Strugnell, and Robins-Browne 2002), they also have highly variable regions of different lengths. These islands are richer in ISs and IS-related sequences than the rest of the genome, they regularly contain several transposase genes, and they are AT-rich, at least in enterobacteria. Comparing similar regions in different islands frequently reveals deletions and insertions, and it is common to observe one transposon partially superimposed on another. The site reported in this paper has probably not been spared: there are several half-sites and fragments of the flanking IRs in the databases.

    There are numerous ways to incorporate new DNA into a recipient chromosome, and many systems are based on identity between donor and recipient DNA. As the alignment shows (fig. 3), the Z-region in the P2-like prophages consists of extended, highly similar sequences that abruptly change into completely dissimilar sequences with a different AT content. Although homologous recombination is known to occur between bacterial chromosomes, including prophage sequences, it is more likely that the observed insertions are the result of site-specific recombination events. At least two observations support this notion. First, the IRs are more conserved closer to the inserts (the boxed part in figure 1). Second, a hypothetical empty site would offer an approximately 200-nt sequence suitable for homologous recombination between two IRs, yet the recombination leading to the insertion of foreign genes in the Z-region has always occurred at one specific point. Many systems are able to perform site-specific recombination; it can be carried out by recombinases supplied by transposable elements, plasmids, phages, or bacteria.

    The Z-regions found in bacteria either contain, or are frequently located near, sequences encoding transposases. This may indicate that this class of recombinases moves the inserts and that the observed IRs and inserts constitute transposons. However, we believe that this association is circumstantial and reflects the nature of the target sequences, and there are many other arguments against this hypothesis. Most transposases act in cis, so their genes would have had to be deleted from the majority of inserts. IS elements and transposons form classes characterized by similar IRs at their ends. They generally show a preference for a particular target sequence and generate flanking direct repeats (DRs) when they integrate. Although some transposable elements insert randomly, we cannot find any class that fits our observations. The recognized classes are also generally spread over many species, or even genera, whereas we have not found any such site outside the enterobacteria. More than 200 flanking nucleotides form part of the IRs in the Z-region of the prophages. These flanking sequences are well conserved in the P2-like prophages as well as in enterobacteria, which makes it difficult to assume that they are selectively neutral. In addition, the phage head can package only a limited amount of DNA, so junk DNA would be deleted in the long run in favor of more useful genes. Perhaps the strongest argument against transposase-mediated transfer is the degree of vertical inheritance, or clonality, of the IRs of the Z-region revealed by the phylogenetic analysis. Clonality is not expected if the region is part of a transposable element, because the IRs belong to the element and are inserted along with it. Vertical inheritance of the region implies that the IRs of the Z-region were inserted only once and that subsequent copies have acquired mutations and followed their hosts as these branched off into separate clones. If the IRs were part of a transposable element, the totally different sequences within the inserts would also be hard to explain: they show no relationship to each other and cannot have originated from a common ancestor. In that case, the element would more likely have inserted into different clones at different times, and the distribution of mutations in the IRs would indicate lateral transfer. One would also expect to find some similar sequences among the inserts.
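    Most IS elements and transposons leave a short target-site duplication, the flanking DRs mentioned above, when they integrate. A minimal way to screen an insert for this signature is to compare the bases immediately to its left and right. The helper flanking_direct_repeat, the 2 to 12 nt search range, and the toy sequences below are assumptions made for illustration, not values taken from this study.

```python
def flanking_direct_repeat(seq: str, start: int, end: int,
                           min_len: int = 2, max_len: int = 12) -> int:
    """Length of the longest direct repeat flanking the insert seq[start:end].

    Compares the bases just left of `start` with the same number of bases
    just right of `end`, trying lengths from max_len down to min_len.
    Returns 0 if no length in that range matches.
    """
    seq = seq.upper()
    for length in range(max_len, min_len - 1, -1):
        if start - length < 0 or end + length > len(seq):
            continue  # the repeat would run off either end of the sequence
        if seq[start - length:start] == seq[end:end + length]:
            return length
    return 0


# Hypothetical example: an 8-nt insert flanked by a 6-nt duplication (CTAGCA).
with_dr = "GGGCTAGCA" + "TTTTAAAA" + "CTAGCAGGG"
print(flanking_direct_repeat(with_dr, 9, 17))     # 6
without_dr = "GGGCTAGCA" + "TTTTAAAA" + "TTTTTTTTT"
print(flanking_direct_repeat(without_dr, 9, 17))  # 0
```

    The choice of start and end matters: in the Z-region the IRs border the insert, so whether they are counted as part of it changes which flanking bases are compared.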

    In the phylogenetic analysis, the bootstrap value for the branch leading to the prophages is somewhat low (67%), but the analysis indicates that the IRs of the Z-region are of monophyletic origin and have not been recently transferred between prophages and bacteria. The region may well have been established long ago, and together with the assumption that unnecessary sequences are trimmed away, this implies that the entire flanking regions are needed, possibly as recognition sites in yet another site-specific recombination system. The site contains enough similar neighboring sequence, and the sequences downstream of the insertion point carry additional direct and inverted repeats, to meet the general prerequisites of the systems used by many site-specific recombinases other than transposases. In such systems, recombination occurs between two identical, or nearly identical, attachment (att) sites; several recombinase molecules commonly need to bind to these sites and, in more complex systems, also to upstream and downstream arm regions in the donor DNA. In many cases, a complete recombination reaction requires additional factors, which also bind more or less close to the att site of the donor DNA. We believe that the most conserved parts of the two observed inverted repeats, the boxed parts of IRL and IRR in fig. 5, correspond to the attL and attR sites formed by a site-specific recombination event between the cores of two att sites, often designated attP and attB. Many observations at the Z-region are consistent with this hypothesis. There are additional IRs within the att site, a common characteristic of the core region of phage integrase attP sites (Campbell 1992) (fig. 4). There are also extra att sites between ORFs in ECOR46 and ECOR48, which are easily explained by a secondary integration using either attL or attR as the integration site. Recombination between the flanking attL and attR sites can also invert the inserted region. Although the most conserved 33-bp part of IRL and IRR should resemble the core att site, its exact position cannot be determined. In most cases, the presumed att core site of the prophage IRs consists of an 8-nt inverted repeat on each side of an 8-nt overlap region (fig. 4), but the IR of the core site of ECOR58 extends to 13 nt (fig. 1). Apparently, there is some sequence variation, and the size and position of the site may have to be adjusted as more data on this system become available. Finding an att site without an insert would help and might point to a specific class of recombinases.
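    The proposed core layout, an 8-nt inverted repeat on each side of an 8-nt overlap region, can be checked mechanically by comparing the left arm with the reverse complement of the right arm. The sketch below assumes exactly that arm + overlap + arm layout; the arm length is a parameter because the ECOR58 inverted repeat extends to 13 nt. The functions revcomp, arm_mismatches and find_core_candidates, and the 24-nt example site, are invented for illustration and are not sequences from figs. 1 or 4.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def arm_mismatches(site: str, arm: int = 8, overlap: int = 8) -> int:
    """Mismatches between the left arm and the reverse complement of the right arm.

    The site is assumed to be laid out as arm + overlap + arm; a perfect
    inverted repeat around the overlap region scores 0.
    """
    site = site.upper()
    if len(site) != 2 * arm + overlap:
        raise ValueError(f"expected {2 * arm + overlap} nt, got {len(site)}")
    left = site[:arm]
    right = site[arm + overlap:]
    return sum(a != b for a, b in zip(left, revcomp(right)))


def find_core_candidates(seq: str, arm: int = 8, overlap: int = 8,
                         max_mismatches: int = 1) -> List[int]:
    """Start positions of windows whose two arms form an inverted repeat."""
    width = 2 * arm + overlap
    return [i for i in range(len(seq) - width + 1)
            if arm_mismatches(seq[i:i + width], arm, overlap) <= max_mismatches]


# Hypothetical 24-nt site: left arm, overlap region, right arm.
site = "GATCCTGA" + "ACGTTGCA" + "TCAGGATC"
print(arm_mismatches(site))                         # 0
print(arm_mismatches(site[:20] + "C" + site[21:]))  # 1
```

    With max_mismatches=0 the scan reports only perfect inverted repeats; allowing one or two mismatches accommodates the kind of sequence variation between core sites noted above.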

    There is no lack of candidate recombinases. The E. coli CFT073 insert, for instance, probably encodes one, a recombinase similar to XerC (table 1). In addition, enterobacterial hosts presumably harbor many capable recombinase systems, and probably more than a dozen distinctly different temperate phage families, each with numerous variants, carry integrases that could do the job. Most of these have unknown att sites and integration mechanisms. These two possibilities, a resident recombinase supplied by the host or one brought in by an invading extrachromosomal element, represent different models of the recombination system.

    Owing to the nature and complexity of bacterial evolution, it is difficult to assess the importance of this mechanism. The evolutionary time-space is full of possible vectors and horizontal transfer mechanisms, and the mechanism presented here may be just one more. Phages play an important part in the evolution of bacteria and can contribute virulence factors (Miao and Miller 1999). Similar lysogenic conversion genes have also been found in prophages of unrelated commensal and pathogenic bacteria (Brüssow and Hendrix 2002). The mechanism presented here for adding lysogenic conversion genes appears to be more controlled and specific than the sometimes random insertion of transposable elements or the homologous recombination of horizontally transferred genes. The similarity between the Z-region of a P2-like phage and that of a bacterium implies that P2-like phages can act as vectors for virulence factors. The region is not a strict lysogenic conversion locus, since a bacterium can acquire a gene through site-specific recombination without integrating a whole phage genome. Homologous recombination between the IRs of two Z-regions located in phages, plasmids, and/or bacteria is probably also possible with general recombination systems, for example, the lambda Red system, which requires only short homologous sequences (Poteete 2001).

    The importance of this site-specific recombination system also depends on how widespread the Z-region is. The large bacterial genome projects are biased toward pathogens, so this site-specific recombination system may be a general lysogenic conversion system rather than one aimed specifically at pathogenic enterobacteria. The IRs of the Z-region are also present in nonpathogens, for example as slightly different IRs in E. coli K-12, but always in genetically unstable regions. This question, together with the age and evolution of the site, the efficiency and frequency of insertion, the enzymes and other factors needed for recombination, and the function of the inserted genes, will have to be addressed in future investigations.

    Acknowledgements

    We would like to thank H. Ochman for supplying the Escherichia coli reference collection (ECOR). Complementary ECOR strains were kindly supplied by D. Hughes. We also thank C. Delaloy, Erasmus student in our lab during the 2001 spring semester, for cloning and initial sequencing, V. Bouchet for discussions and comments on the manuscript, and D. Mazel for supplying information about integrons. The work was supported by grants from The Swedish Research Council and from the Erik Philip-Sörensen Foundation.

    Literature Cited

    Bertani, L. E. 1968. Abortive induction of bacteriophage P2. Virology 36:87-103.

    Bertani, L. E. 1976. Characterization of clear mutants belonging to the Z gene of bacteriophage P2. Virology 71:85-96.

    Bertani, L. E. 1978. Cold-sensitive mutations in the Z gene of prophage P2 that result in increased sensitivity of the lysogens to a low molecular weight product of the host bacteria. Mol. Gen. Genet. 166:85-90.

    Blattner, F. R., G. Plunkett III, C. A. Bloch et al. (14 co-authors). 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453-1474.

    Brüssow, H., and R. W. Hendrix. 2002. Phage genomics: small is beautiful. Cell 108:13-16.

    Buchrieser, C., P. Glaser, C. Rusniok, H. Nedjari, H. D'Hauteville, F. Kunst, P. Sansonetti, and C. Parsot. 2000. The virulence plasmid pWR100 and the repertoire of proteins secreted by the type III secretion apparatus of Shigella flexneri. Mol. Microbiol. 38:760-771.

    Calendar, R., S. Yu, H. Myung, V. Barreiro, R. Odegrip, K. Carlson, L. Davenport, G. Mosig, G. E. Christie, and E. Haggård-Ljungquist. 1998. The lysogenic conversion genes of coliphage P2 have unusually high AT content. Pp. 241-252 in M. Syvanen and C. I. Kado, eds. Horizontal gene transfer. Chapman & Hall, London.

    Campbell, A. M. 1992. Chromosomal insertion sites for phages and plasmids. J. Bacteriol. 174:7495-7499.

    Campbell, A. M. 2000. Lateral gene transfer in prokaryotes. Theor. Pop. Biol. 57:71-77.

    Genetics Computer Group. 1999. Accelrys Inc. San Diego, California.

    Hayashi, T., K. Makino, M. Ohnishi et al. (19 co-authors). 2001. Complete genome sequence of enterohaemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12. DNA Res. 8:11-22.

    Hendrix, R. W., J. G. Lawrence, G. F. Hatfull, and S. Casjens. 2000. The origins and ongoing evolution of viruses. Trends Microbiol. 8:504-508.

    Johnson, J. R., P. Delavari, A. L. Stell, G. Prats, U. Carlino, and T. A. Russo. 2001. Integrity of archival strain collections: the ECOR collection. ASM News 67:288-289.

    Lawrence, J. G., and H. Ochman. 1998. Molecular archaeology of the Escherichia coli genome. Proc. Natl. Acad. Sci. USA 95:9413-9417.

    McClelland, M., K. E. Sanderson, J. Spieth et al. (23 co-authors). 2001. Complete genome sequence of Salmonella enterica serovar Typhimurium LT2. Nature 413:852-856.

    Miao, E. A., and S. I. Miller. 1999. Bacteriophages in the evolution of pathogen-host interactions. Proc. Natl. Acad. Sci. USA 96:9452-9454.

    Murata, T., M. Ohnishi, T. Ara et al. (13 co-authors). 2002. Complete nucleotide sequence of plasmid Rts1: implications for evolution of large plasmid genomes. J. Bacteriol. 184:3194-3202.

    Nilsson, A. S., and E. Haggård-Ljungquist. 2001. Detection of homologous recombination among bacteriophage P2 relatives. Mol. Phylogenet. Evol. 21:259-269.

    Ochman, H., and R. K. Selander. 1984. Standard reference strains of Escherichia coli from natural populations. J. Bacteriol. 157:690-693.

    Okinaka, R., K. Cloud, O. Hampton et al. (11 co-authors). 1999. Sequence, assembly and analysis of pXO1 and pXO2. J. Appl. Microbiol. 87:261-262.

    Parkhill, J., G. Dougan, K. D. James et al. (38 co-authors). 2001. Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18. Nature 413:848-852.

    Pearson, W. R. 2000. Flexible sequence similarity searching with the FASTA3 program package. Methods Mol. Biol. 132:185-219.

    Pearson, W. R., T. Wood, Z. Zhang, and W. Miller. 1997. Comparison of DNA sequences with protein sequences. Genomics 46:24-36.

    Perna, N. T., G. Plunkett III, V. Burland et al. (25 co-authors). 2001. Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 409:529-533.

    Poteete, A. R. 2001. What makes the bacteriophage lambda Red system useful for genetic engineering: molecular mechanism and biological function. FEMS Microbiol. Lett. 201:9-14.

    Rasko, D. A., J. A. Phillips, X. Li, and H. L. Mobley. 2001. Identification of DNA sequences from a second pathogenicity island of uropathogenic Escherichia coli CFT073: probes specific for uropathogenic populations. J. Infect. Dis. 184:1041-1049.

    Sunshine, M. G., M. Thorn, W. Gibbs, R. Calendar, and B. Kelly. 1971. P2 phage amber mutants: characterization by use of a polarity suppressor. Virology 46:691-702.

    Swofford, D. L. 1999. PAUP*: phylogenetic analysis using parsimony (*and other methods). Sinauer Associates, Sunderland, Mass.

    Tauschek, M., R. A. Strugnell, and R. M. Robins-Browne. 2002. Characterization and evidence of mobilization of the LEE pathogenicity island of rabbit-specific strains of enteropathogenic Escherichia coli. Mol. Microbiol. 44:1533-1550.

    Tettelin, H., N. J. Saunders, J. Heidelberg et al. (39 co-authors). 2000. Complete genome sequence of Neisseria meningitidis serogroup B strain MC58. Science 287:1809-1815.

    Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.

    Wagner, P. L., and M. K. Waldor. 2002. Bacteriophage control of bacterial virulence. Infect. Immun. 70:3985-3993.

    Wren, B. W. 2000. Microbial genome analysis: insights into virulence, host adaption and evolution. Nat. Rev. Genet. 1:30-39.